Clustering Genes Using Heterogeneous Data Sources

Authors

Erliang Zeng

Chengyong Yang

Tao Li

Giri Narasimhan

Abstract

Clustering of gene expression data is a standard exploratory technique for identifying closely related genes. Many other sources of data are also likely to be of great assistance in the analysis of gene expression data, and together these data provide a means to begin elucidating the large-scale modular organization of the cell. The authors consider the challenging task of developing exploratory analytical techniques that deal with multiple complete and incomplete information sources. The Multi-Source Clustering (MSC) algorithm they developed performs clustering with multiple, but complete, sources of data. To deal with incomplete data sources, the authors adopted the MPCK-means clustering algorithm to perform exploratory analysis on one complete source together with other, potentially incomplete, sources provided in the form of constraints. This paper makes three contributions: it presents a new clustering algorithm, MSC, for exploratory analysis using two or more diverse but complete data sources; it studies the effectiveness of constraint sets and the robustness of the constrained clustering algorithm using multiple sources of incomplete biological data; and it incorporates such incomplete data into the constrained clustering algorithm in the form of constraint sets.
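To make the idea of an incomplete source "provided in the form of constraints" concrete, here is a minimal Python sketch. It is not the authors' MSC or MPCK-means implementation, and the abstract does not say how the constraints are derived. The sketch assumes a partial gene-to-label annotation (a hypothetical stand-in for an incomplete biological source) and turns it into must-link and cannot-link pairs, then counts how many of them a given cluster assignment violates. The function names and the toy gene identifiers are illustrative assumptions.

```python
from itertools import combinations


def constraints_from_annotations(annotations):
    """Derive pairwise constraints from a partial gene -> label mapping.

    Assumption for illustration: two annotated genes with the same label
    must be linked; two with different labels cannot be linked. Genes
    missing from `annotations` contribute no constraints at all, which is
    how the incompleteness of the source is tolerated.
    """
    must_link, cannot_link = set(), set()
    for a, b in combinations(sorted(annotations), 2):
        if annotations[a] == annotations[b]:
            must_link.add((a, b))
        else:
            cannot_link.add((a, b))
    return must_link, cannot_link


def count_violations(assignment, must_link, cannot_link):
    """Count the constraints that a gene -> cluster assignment breaks."""
    broken_must = sum(1 for a, b in must_link if assignment[a] != assignment[b])
    broken_cannot = sum(1 for a, b in cannot_link if assignment[a] == assignment[b])
    return broken_must + broken_cannot


# Hypothetical incomplete source: only three of five genes are annotated.
annotations = {"g1": "ribosome", "g2": "ribosome", "g4": "proteasome"}
# Hypothetical clustering obtained from the complete source.
assignment = {"g1": 0, "g2": 0, "g3": 0, "g4": 1, "g5": 1}

must_link, cannot_link = constraints_from_annotations(annotations)
violations = count_violations(assignment, must_link, cannot_link)
```

A constrained clustering algorithm would use a penalty of this kind inside its objective rather than as an after-the-fact check; the sketch only shows how constraints can be represented and evaluated.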

Similar resources

Incorporating heterogeneous biological data sources in clustering gene expression data

In this paper, a similarity measure between genes with protein-protein interactions is proposed. The ChIP-chip data are converted into the same form as gene expression data, with Pearson correlation as the similarity measure. On the basis of the similarity measures for protein-protein interaction data and ChIP-chip data, the combined dissimilarity measure is defined. The combined distance measure ...

Clustering Heterogeneous Data Using Clustering by Compression

Nowadays, we have to deal with a large quantity of unstructured data, produced by a number of sources. The application of clustering on the World Wide Web is essential to getting structured information in response to user queries. In this paper, we intend to test the results of a new clustering technique – clustering by compression – when applied to heterogeneous sets of data. The clustering by...

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

Clustering Similar Schema Elements Across Heterogeneous Databases: A First Step in Database Integration

Interschema relationship identification (IRI), that is, determining the relationships among schema elements in heterogeneous data sources, is an important first step in integrating the data sources. This chapter proposes a cluster analysis-based approach to semi-automating the IRI process, which is typically very time-consuming and requires extensive human interaction. We apply multiple cluster...

Clustering Schema Elements for Semantic Integration of Heterogeneous Data Sources

Interschema relationship identification (IRI), that is, determining the relationships among schema elements in heterogeneous data sources, is an important step in integrating the data sources. This article proposes a cluster analysis based approach to semi-automating the IRI process, which is typically very time-consuming and requires extensive human interaction. The authors apply multiple clus...

Journal: IJKDB

Volume: 1

Publication date: 2010
